Implement prism -> sorbet conversion for multi-statement programs #28

egiurleo · 2024-06-14T20:39:06Z

Motivation

Sorbet constructs slightly different ASTs depending on whether a program contains one statement or more than one. Correctly parsing multi-statement programs will make it easier to benchmark this project.

Test plan

Added automated tests for parsing a multi-statement program.

egiurleo · 2024-06-14T20:53:21Z

main/pipeline/pipeline.cc

+            pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
+            pm_statements_node *stmts = programNode->statements;
+
+            auto size = stmts->body.size;
+
+            // For a single statement, do not create a Begin node and just return the statement
+            if (size == 1) {
+                return convertPrismToSorbet((pm_node *)stmts->body.nodes[0], parser, gs);
+            }
+
+            // For multiple statements, convert each statement and add them to the body of a Begin node
+            parser::NodeVec sorbetStmts;
+
+            for (int i = 0; i < stmts->body.size; i++) {
+                pm_node_t *node = stmts->body.nodes[i];
+                unique_ptr<parser::Node> convertedStmt = convertPrismToSorbet(node, parser, gs);
+                sorbetStmts.emplace_back(std::move(convertedStmt));
+            }
+
+            auto *loc = &programNode->base.location;
+
+            return make_unique<parser::Begin>(locOffset(loc, parser), std::move(sorbetStmts));


I'd love some advice on the implementation here. I decided to handle all the statement logic in the program node case because Sorbet doesn't have a representation of statement nodes; it just stores them as a NodeVec (a vector of nodes) in the body of a Begin node, which is the Sorbet equivalent of a program.

Probably not a huge deal because this is still a prototype but I'm trying to learn how to do things in C++ 😅
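
The tree shape described above can be sketched in isolation. This is only an illustration: `Node`, `NodeVec`, `Statement`, `Begin`, and `wrapProgram` are simplified stand-ins written for this example, not Sorbet's actual `parser::` classes. It assumes the statements have already been converted.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Sorbet's parser::Node and parser::NodeVec.
struct Node {
    virtual ~Node() = default;
};
using NodeVec = std::vector<std::unique_ptr<Node>>;

// Hypothetical leaf node representing one already-converted statement.
struct Statement : Node {
    std::string source;
    explicit Statement(std::string source) : source(std::move(source)) {}
};

// Hypothetical stand-in for parser::Begin: it owns the list of statements.
struct Begin : Node {
    NodeVec stmts;
    explicit Begin(NodeVec stmts) : stmts(std::move(stmts)) {}
};

// A single statement is returned unwrapped; any other count is wrapped in a Begin.
std::unique_ptr<Node> wrapProgram(NodeVec stmts) {
    if (stmts.size() == 1) {
        return std::move(stmts.front());
    }
    return std::make_unique<Begin>(std::move(stmts));
}
```

With one statement the result is the statement node itself; with two or more the result is a `Begin` whose body holds them in order.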

I think this is fine if you think adding a node will require too many changes.

My C++ comment would be to iterate using a range-based for loop instead:
for (auto node : stmts->body.nodes) {}. It's cleaner and helps prevent out-of-range indexing bugs. This version copy-constructs node on each iteration, which may be inefficient; it can be typed differently (for example, as a reference) depending on what you need it for.

It's really good to know you can do that in C++! I actually can't iterate over a pm_node_list this way because it doesn't implement the begin function (I get the error Invalid range expression of type 'struct pm_node **'; no viable 'begin' function available). I can look into adding that to the Prism API, but for now I think this is the only way to iterate.
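
The error arises because a raw pointer carries no length, so a range-based for loop has no `begin`/`end` to call. One C++17-compatible workaround is a small non-owning view over the pointer and size. The sketch below is an assumption-laden illustration: `DemoNode` and `DemoNodeList` are hypothetical stand-ins that only mimic the pointer-plus-size layout discussed here, and `NodeListView` and `sumTypes` are names invented for this example, not part of Prism or Sorbet.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a Prism node and node list (only the fields used here).
struct DemoNode {
    int type;
};
struct DemoNodeList {
    std::size_t size;
    std::size_t capacity;
    DemoNode **nodes;
};

// Non-owning view: begin()/end() are all a range-based for loop needs.
// It never copies or frees the underlying buffer.
class NodeListView {
public:
    explicit NodeListView(const DemoNodeList &list) : first(list.nodes), count(list.size) {}
    DemoNode **begin() const { return first; }
    DemoNode **end() const { return first + count; }
    std::size_t size() const { return count; }

private:
    DemoNode **first;
    std::size_t count;
};

// Example use: sum the type tags of every node in the list.
int sumTypes(const DemoNodeList &list) {
    int total = 0;
    for (DemoNode *node : NodeListView(list)) {
        total += node->type;
    }
    return total;
}
```

The view must not outlive the list it points into, since it holds only a borrowed pointer.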

KaanOzkan · 2024-06-17T14:39:02Z

main/pipeline/pipeline.cc

I think this is fine if you think adding a node will require too many changes.

My C++ comment would be to iterate using a range-based for loop instead:
for (auto node : stmts->body.nodes) {}. It's cleaner and helps prevent out-of-range indexing bugs. This version copy-constructs node on each iteration, which may be inefficient; it can be typed differently (for example, as a reference) depending on what you need it for.
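
To make the point about copies concrete, here is a minimal sketch using a hypothetical `Tracked` type that counts its own copy constructions (the type and both helper functions are invented for this example). When the elements are plain pointers, as with a list of node pointers, the by-value copy is just a pointer copy and is cheap; the difference matters for heavier element types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical type that counts how often it is copy-constructed.
struct Tracked {
    static int copies;
    Tracked() = default;
    Tracked(const Tracked &) { ++copies; }
    Tracked &operator=(const Tracked &) = default;
};
int Tracked::copies = 0;

// `auto` deduces a value type, so each iteration copy-constructs the element.
int copiesByValue(const std::vector<Tracked> &items) {
    Tracked::copies = 0;
    for (auto item : items) {
        (void)item;
    }
    return Tracked::copies;
}

// `const auto &` binds a reference to the element, so nothing is copied.
int copiesByReference(const std::vector<Tracked> &items) {
    Tracked::copies = 0;
    for (const auto &item : items) {
        (void)item;
    }
    return Tracked::copies;
}
```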

amomchilov · 2024-06-19T15:29:56Z

main/pipeline/pipeline.cc

+            pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
+            pm_statements_node *stmts = programNode->statements;
+
+            auto size = stmts->body.size;


You can simplify this code a bit by wrapping this raw C pointer and size in a C++ std::span. It's like a vector in that it lets you use C++-style foreach loops, but it doesn't copy, own, or free the buffer.

Suggested change

-            pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
-            pm_statements_node *stmts = programNode->statements;
-
-            auto size = stmts->body.size;
+            pm_program_node *programNode = reinterpret_cast<pm_program_node *>(node);
+            pm_statements_node *stmts = programNode->statements;
+
+            std::span<pm_node_t *> nodes(stmts->body.nodes, stmts->body.size);

Then you can:

    if (nodes.size() == 1) {
        return convertPrismToSorbet(nodes[0], parser, gs);
    }

    for (auto node : nodes) {
        unique_ptr<parser::Node> convertedStmt = convertPrismToSorbet(node, parser, gs);
        sorbetStmts.emplace_back(std::move(convertedStmt));
    }

Aw man, we're actually using C++17, which doesn't implement span 😭

That would be a real bummer, but luckily, we have absl::Span, which is already used a fair bit throughout the codebase! 🥳

#34

egiurleo force-pushed the emily/parse-multi-statement branch 2 times, most recently from 826a67d to c0e0cb2, June 14, 2024 20:51

egiurleo force-pushed the emily/parse-multi-statement branch from c0e0cb2 to 2e863c2, June 14, 2024 20:51

egiurleo commented Jun 14, 2024

egiurleo marked this pull request as ready for review June 14, 2024 20:53

egiurleo requested review from Morriar, amomchilov and KaanOzkan June 14, 2024 20:54

egiurleo self-assigned this Jun 14, 2024

KaanOzkan approved these changes Jun 17, 2024

Morriar reviewed Jun 17, 2024

amomchilov reviewed Jun 19, 2024

egiurleo merged commit 2b83f0e into proj-parsing-w-prism-in-sorbet Jul 8, 2024
1 check passed

egiurleo deleted the emily/parse-multi-statement branch July 8, 2024 19:01
